Abstract

Multiset-CCG is a combinatory catego- 
rial formalism that can capture the syn- 
tax and interpretation of "free" word or- 
der in languages such as Turkish. The 
formalism compositionally derives the 
predicate-argument structure and the in- 
formation structure (e.g. topic, focus) 
of a sentence, and uniformly handles 
word order variation among arguments 
and adjuncts within a clause, as well 
as in complex clauses and across clause 
boundaries. 

1 Introduction 

In this paper, I present a categorial formalism, 
Multiset CCG (based on Combinatory Categorial 
Grammars (Steedman, 1985; Steedman, 1991)), 
that captures the syntax and context-dependent 
interpretation of "free" word order in languages 
such as Turkish. Word order variation in relatively
free word order languages, such as Czech,
Finnish, German, Japanese, Korean, and Turkish, is
used to convey distinctions in meaning that go 
beyond traditional propositional semantics. The 
word order in these languages serves to structure 
the information being conveyed to the hearer, e.g. 
by indicating what is the topic and the focus of the 
sentence (as will be defined in the next section). In 
fixed word order languages such as English, these 
are indicated largely through intonation and stress 
rather than word order. 

The context-appropriate use of "free" word or- 
der is of considerable importance in developing 
practical applications in natural language gener- 
ation, machine translation, and machine-assisted 
translation. I have implemented a database query 
system in Prolog, described in (Hoffman, 1994), 
which uses Multiset CCG to interpret and gen- 
erate Turkish sentences with context-appropriate 
word orders. Here, I concentrate on further developing
the formalism, especially to handle complex
sentences.

There have been other formalisms that integrate
information structure into the grammar for
"free" word order languages, e.g. (Sgall et al.,
1986; Engdahl and Vallduvi, 1994; Steinberger, 1994).
However, I believe my approach is the first to 
tackle complex sentences with embedded infor- 
mation structures and discontinuous constituents. 
Multiset CCG can handle free word order among 
arguments and adjuncts in all clauses, as well 
as word order variation across clause boundaries, 
i.e. long distance scrambling. The advantage 
of using a combinatory categorial formalism is 
that it provides a compositional and flexible sur- 
face structure, which allows syntactic constituents 
to easily correspond with information structure 
units. A novel characteristic of this approach
is that the context-appropriate use of word order
is captured by compositionally building the
predicate-argument structure (AS) and the information
structure (IS) of a sentence in parallel.

After presenting the motivating Turkish data
in Section 2, I present a competence grammar for
Turkish in Section 3 that captures the basic syntactic
and semantic relationships between predicates
and their arguments or adjuncts while allowing
"free" word order. This grammar, which
derives the predicate-argument structure, is then
integrated with the information structure in Section 4.
In Section 5, the formalism is extended to
account for complex sentences and long distance
scrambling.

2 The Turkish Data 

The arguments of a verb in Turkish (as well 
as many other "free" word order languages) do 
not have to occur in a fixed word order. For 
instance, all six permutations of the transitive 
sentence below are possible, since case-marking, 
rather than word order, serves to differentiate the 
arguments. 1 
1 The accusative, dative, genitive, ablative, and
locative cases are associated with specific morphemes 



(1) a. Fatma Ahmet'i gördü.
       Fatma Ahmet-Acc see-Past.
       "Fatma saw Ahmet."

    b. Ahmet'i Fatma gördü.

    c. Fatma gördü Ahmet'i.

    d. Ahmet'i gördü Fatma.

    e. Gördü Ahmet'i Fatma.

    f. Gördü Fatma Ahmet'i.

Although all the permutations have the same
propositional interpretation, see(Fatma, Ahmet),
each word order conveys a different discourse
meaning that is appropriate only to a specific discourse
situation. We can generally associate the
sentence-initial position with the topic, the im- 
mediately preverbal position with the focus which 
receives the primary stress in the sentence, and 
postverbal positions with backgrounded information
(Erguvanli, 1984). The postverbal positions
are influenced by the given/new status of entities
within the discourse; postverbal elements are
always evoked discourse entities or are inferrable 
from entities already evoked in the previous dis- 
course, and thus, help to ground the sentence in 
the current context. 

I define topic and focus according to their infor- 
mational status. A sentence can be divided into a 
topic and a comment, where the topic is the main 
element that the sentence is about, and the com- 
ment is the main information we want to convey 
about this topic. Assuming the hearer's discourse 
model or knowledge store is organized by topics, 
the sentence topic can be seen as specifying an
"address" in the hearer's knowledge store (Reinhart,
1982; Vallduvi, 1990). The informational
focus is the most information-bearing constituent
in the sentence (Vallduvi, 1990); it is the new
or important information in the sentence (within
the comment), and receives prosodic prominence 
in speech. These information structure compo- 
nents are successful in describing the context- 
appropriate answer to database queries. In this 
domain, the focus is the new or important part of 
the answer to a wh-question, while the topic is the 
main entity that the question and answer are both 
about, that can be paraphrased using the clause 
"As for X". In other domains, finding the topic 
and focus of sentences according to the context 
may be more complicated. 

We can now explain why certain word orders 
are appropriate or inappropriate in a certain con- 
text, in this case database queries. For example, a 
speaker may use the SOV order in (2b) to answer 
the wh-question in (2a) because the speaker wants 
to focus the new object, Ahmet, and so places it 
in the immediately preverbal position. However, 
given a different wh-question in (3), the subject, 



(and their vowel-harmony variants) which attach to 
the noun; nominative case and subject-verb agreement 
for third person singular are unmarked. 



Fatma, is the focus of the answer, while Ahmet is 
the topic, a link to the previous context, and thus 
the OSV word order is used. 2 

(2) a. Fatma kimi gördü?
       Fatma who-Acc see-Past?
       "Who did Fatma see?"
    b. Fatma Ahmet'i gördü.          SOV
       Fatma Ahmet-Acc see-Past.
       "Fatma saw AHMET."

(3) a. Ahmet'i kim gördü?
       Ahmet-Acc who see-Past.
       "Who saw Ahmet?"
    b. Ahmet'i Fatma gördü.          OSV
       Ahmet-Acc Fatma see-Past.
       "As for Ahmet, FATMA saw him."

Adjuncts can also occur in different sentence 
positions in Turkish sentences depending on the 
context. The different positions of the sentential 
adjunct "yesterday" in the following sentences re- 
sult in different discourse meanings, much as in 
English. 

(4) a. Fatma Ahmet'i dün gördü.
       Fatma Ahmet-Acc yesterday see-Past.
       "Fatma saw Ahmet YESTERDAY."

    b. Dün Fatma Ahmet'i gördü.
       Yesterday Fatma Ahmet-Acc see-Past.
       "Yesterday, Fatma saw Ahmet."

    c. Fatma Ahmet'i gördü dün.
       Fatma Ahmet-Acc see-Past yesterday.
       "Fatma saw Ahmet, yesterday."

Clausal arguments, just like simple NP argu- 
ments, can occur anywhere in the matrix sentence 
as long as they are case-marked, (5)a and b. Sub- 
ordinate verbs in Turkish resemble gerunds in En- 
glish; they take a genitive marked subject and are 
case-marked like NPs, but they assign structural 
case to the rest of their arguments like verbs. The 
arguments and adjuncts within most embedded
clauses can occur in any word order, also seen in
(5)a and b. In addition, elements from the embed- 
ded clause can occur in matrix clause positions, 
i.e. long distance scrambling, (5c). As indicated 
by the translations, word order variation in com- 
plex sentences also affects the interpretation. 

(5) a. Ayşe [dün Fatma'nın gittiğini] biliyor.
       Ayşe [yest. Fatma-Gen go-Gerund-Acc] knows.
       "Ayşe knows that yesterday, FATMA left."

    b. [Dün gittiğini Fatma'nın] Ayşe biliyor.
       [Yest. go-Gerund-Acc Fatma-Gen] Ayşe knows.
       "It's AYŞE who knows that she, Fatma, left YESTERDAY."

    c. Fatma'nın Ayşe [dün gittiğini] biliyor.
       Fatma-Gen Ayşe [yest. go-Ger-Acc] knows.
       "As for Fatma, Ayşe knows that she left YESTERDAY."



2 In the English translations, the words in capitals 
indicate phonological focus. 



The information structure (IS) is distinct from 
predicate-argument structure (AS) in languages 
such as Turkish because adjuncts and elements 
long distance scrambled from embedded clauses 
can take part in the IS of the matrix sentence with- 
out taking part in the AS of the matrix sentence. 

As motivated from the data, a formalism for 
"free" word order languages such as Turkish must 
be flexible enough to handle word order varia- 
tion among the arguments and the adjuncts in all 
clauses, as well as the long distance scrambling 
of elements from embedded clauses. In addition, 
to capture the context-appropriate use of word 
order, the formalism must associate information 
structure components such as topic and focus with 
the appropriate sentence positions, regardless of 
the predicate-argument structure of the sentence, 
and be able to handle the information structure 
of complex sentences. In the next sections I will 
present a combinatory categorial formalism which 
can handle these characteristics of "free" word or- 
der languages. 

3 "Free" Word Order Syntax 

In Multiset-CCG 3, we capture the syntax of free
argument order within a clause by relaxing the
subcategorization requirements of a verb so that
it does not specify the linear order of its arguments.
Each verb is assigned a function category
in the lexicon which subcategorizes for a multiset
of arguments, without linear order restrictions.
For instance, a transitive verb has the category
S|{Nn, Na}, a function looking for a set of arguments,
a nominative case noun phrase (Nn) and
an accusative case noun phrase (Na), and resulting
in the category S, a complete sentence, once
it has found these arguments in any order.

The syntactic category for verbs provides no hierarchical
or precedence information. However, it
is associated with a propositional interpretation
that does express the hierarchical ranking of the
arguments. For example, the verb "see" is assigned
the lexical category S:see(X, Y)|{Nn:X, Na:Y},
and the noun "Fatma" is assigned
Nn:Fatma, where the semantic interpretation
is separated from the syntactic representation by
a colon. These categories are a shorthand for the
many syntactic and semantic features associated
with each lexical item. The verbal functions can
also specify a direction feature for each of their arguments,
notated in the rules as an arrow above
the argument. Thus, verb-final languages such as
Korean can be modeled by using this direction
feature in verbal categories, e.g. S|{←Nn, ←Na},
with the arrow written here before each argument.

Multiset-CCG contains a small set of rules that 
combine these categories into larger constituents. 
The following application rules allow a function 



such as a verbal category to combine with one of
its arguments to its right (>) or left (<). We
assume that a category X|∅, where there are no
arguments left in the multiset, rewrites by a clean-up
rule to just X.

(6) a. Forward Application (>):
       X|(Args ∪ {Y})   Y   ⇒   X|Args
    b. Backward Application (<):
       Y   X|(Args ∪ {Y})   ⇒   X|Args

Using these application rules, a verb can ap- 
ply to its arguments in any order. For exam- 
ple, the following is a derivation of a transi- 
tive sentence with the word order Object-Subject- 
Verb; variables in the semantic interpretations are 
italicized. 4 

(7)
Ahmet'i      Fatma       gördü.
Ahmet-Acc    Fatma       saw.
Na:Ahmet     Nn:Fatma    S:see(X, Y)|{Nn:X, Na:Y}
             ------------------------------------- <
             S:see(Fatma, Y)|{Na:Y}
-------------------------------------------------- <
S:see(Fatma, Ahmet)

In fact, all six permutations of this sentence can 
be derived by the Multiset-CCG rules, and all 
are assigned the same propositional interpreta- 
tion, see(Fatma, Ahmet). 
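
The application rules in (6) can be illustrated with a minimal Python sketch. It is not the Prolog implementation cited in this paper: it assumes a toy lexicon with one verb, ignores direction features and unification over DAGs, and the names `NOUNS`, `apply_argument` and `derive` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter
from itertools import permutations

VERB = "gördü"  # "saw"; assumed lexical category S:see(X, Y)|{Nn:X, Na:Y}
NOUNS = {"Fatma": ("Nn", "Fatma"), "Ahmet'i": ("Na", "Ahmet")}  # toy lexicon


def apply_argument(args, bindings, case, meaning):
    """One application step: remove an argument of the given case from the
    multiset and bind its semantic variable. Returns None if nothing matches."""
    for (arg_case, var) in args:
        if arg_case == case:
            return args - Counter([(arg_case, var)]), {**bindings, var: meaning}
    return None


def derive(words):
    """Return the propositional interpretation, or None if the derivation fails."""
    args = Counter([("Nn", "X"), ("Na", "Y")])
    bindings = {}
    v = words.index(VERB)
    # Backward application to preverbal NPs (nearest first), then forward
    # application to postverbal NPs.
    for word in [*reversed(words[:v]), *words[v + 1:]]:
        case, meaning = NOUNS[word]
        step = apply_argument(args, bindings, case, meaning)
        if step is None:
            return None
        args, bindings = step
    if args:  # arguments still missing, so the result is not a complete S
        return None
    return f"see({bindings['X']}, {bindings['Y']})"


for order in permutations(["Fatma", "Ahmet'i", VERB]):
    print(" ".join(order), "=>", derive(list(order)))
```

Because the arguments are stored as a multiset, the order in which the verb finds them has no effect on the bindings, so every permutation receives the same interpretation.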

The following composition rules combine two
functions with set-valued arguments, e.g. two
verbal categories, or a verbal category and an adjunct.

(8) a. Forward Composition (>B):
       X|(Args_X ∪ {Y})   Y|Args_Y   ⇒   X|(Args_X ∪ Args_Y)
    b. Backward Composition (<B):
       Y|Args_Y   X|(Args_X ∪ {Y})   ⇒   X|(Args_X ∪ Args_Y)
    c. Restriction: Y ≠ NP.

Through the use of the composition rules, 
Multiset-CCGs can handle the free word order 
of sentential adjuncts. Adjuncts are assigned a
function category S|{S} that can combine with
any function that will also result in S, a complete
sentence. The same composition rules allow two 
verbs to compose together to handle complex sen- 
tences with embedded clauses. This will be dis- 
cussed further in section 5. 

The restriction Y ≠ NP on the Multiset-CCG
composition rules prevents the categories for
verbs, S|{NP}, and for adjectives, NP|{NP},
from combining together before combining with
a bare noun. This captures the fact that simple
NPs must be continuous and head-final in Turkish.
Multiset CCG is flexible enough to handle



3 A preliminary version of the syntactic component 
of the grammar was presented in (Hoffman, 1992). 



4 In my implementation of this grammar, DAG- 
unification is used in the rules. To improve the effi- 
ciency of unification and parsing, the arguments of the 
categories represented as DAGS are associated with 
feature labels that indicate their category and case. 



"free" word order languages that are freer than 
Turkish, such as Warlpiri, through the use of unrestricted
composition rules, but it can also handle
languages more restrictive in word order
such as Korean by restricting the categories that
can take part in the composition rules.

4 The Discourse Meaning of 
"Free" Word Order 

Word order variation in Turkish and other "free" 
word order languages is used to express the infor- 
mation structure of a sentence. The grammar pre- 
sented in the last section determines the predicate- 
argument structure of a sentence, regardless of 
word order. In this section, I add the ordering 
component of the grammar where the informa- 
tion structure of a sentence is determined. The 
simple compositional interface described below al- 
lows the AS and the IS of a sentence to be derived 
in parallel. This interface is very similar to Steed- 
man's approach in integrating prosody and syntax 
in CCGs for English (Steedman, 1991). 

A. Each Multiset-CCG category encoding syn- 
tactic and semantic properties in the AS is 
associated with an Ordering Category which 
encodes the ordering of IS components. 

B. Two constituents can combine if and only if 

i. their syntactic/semantic categories can 
combine using the Multiset-CCG appli- 
cation and composition rules, 

ii. and their Ordering Categories can com- 
bine using the rules below: 

Simple Forward Application (>):
    X/Y   Y   ⇒   X

Simple Backward Application (<):
    Y   X\Y   ⇒   X

Identity (=):
    X   X   ⇒   X

Every verbal category in Multiset-CCG is as- 
sociated with an ordering category, which serves 
as a template for the IS. For example, the order- 
ing category in (9) is a function that specifies the 
components which must be found to complete a 
possible IS. The forward and backward slashes in 
the category indicate the direction in which the 
arguments must be found, and the parentheses 
around arguments indicate optionality. The variables
T, F, G1, G2 will be unified with the interpretations
of the proper constituents in the sentence
during the derivation.
(9)  I/(Ground:G2)\Topic:T\(Ground:G1)\Focus:F

     where I = [ Topic:   T
                 Comment: [ Focus:  F
                            Ground: [verb, G1, G2] ] ]

The function above can use the simple application
rules to first combine with a focused constituent
on its left, then a ground constituent on its left,
then a topic constituent on its left, and a ground
constituent on its right. This function will result
in a complete IS only if it finds the obligatory
sentence-initial topic and the immediately preverbal
focus constituent; its other arguments (the
ground) are optional and can be skipped during
the derivation through a category rewriting rule,
X|(Y) ⇒ X, that may apply after the application
rules. 5

5 Another IS is available where the topic component
is marked as "inferrable", for those cases where the
topic is a zero pronoun instead of an element which is
realized in the sentence. After the derivation is complete,
further discourse processing infers the identity
of the unrealized topic from among the salient entities
in the discourse model.

Nonverbal elements are associated with simpler
ordering categories, often just a variable which
can unify with the topic, focus, or any other component
in the IS template during the derivation.
The identity rule allows two constituents with the
same discourse function (often variables) to combine.
These simpler ordering categories also contain
a feature which indicates whether they represent
given or new information in the discourse
model, which is dynamically checked during the
derivation. Restrictions (such as the requirement that
elements to the right of the verb be discourse-old
information in Turkish) are expressed as features on the
arguments of the verbal ordering functions.

What is novel about this formalism is that the
predicate-argument structure and the information
structure of a sentence are built in parallel in a
compositional way. For example, given the following
question, we may answer in a word order
which indicates that "today" is the topic of the
sentence, and "Little Ahmet" is the focus. The
derivation for this answer is seen in Figure 1.

(10) a. Bugün kimi görecek Fatma?
        Today who-Acc see-Fut Fatma?
        "As for today, who will Fatma see?"

     b. Bugün küçük Ahmet'i görecek Fatma.
        Today little Ahmet-Acc see-Fut Fatma.
        "Today, she, Fatma, will see Little AHMET."

(11)
Bugün     Küçük     Ahmet'i      gördü     Fatma.
Today     little    Ahmet-Acc    saw       Fatma.

Lexical categories (word, AS category, ordering category):
  Bugün      S:today(P)|{S:P}            X:today
  Küçük      Nx:little(Z)/Nx:Z           Y:little
  Ahmet'i    Na:Ahmet                    Z:Ahmet
  gördü      S:see(X, Y)|{Nn:X, Na:Y}    I/(Grnd2)\Top\(Grnd1)\Foc
  Fatma      Nn:Fatma                    W:Fatma

Küçük + Ahmet'i  (>)
  AS = Na:little(Ahmet)
  IS = Y:[little, Ahmet]

[Küçük Ahmet'i] + gördü  (AS: <; IS: <, skip)
  AS = S:see(X, little(Ahmet))|{Nn:X}
  IS = [Focus:[little, Ahmet], Ground:see]/(Grnd2)\Top

Bugün + [Küçük Ahmet'i gördü]  (AS: >B; IS: <)
  AS = S:today(see(X, little(Ahmet)))|{Nn:X}
  IS = [Topic:today, Focus:[little, Ahmet], Ground:see]/(Grnd2)

[Bugün Küçük Ahmet'i gördü] + Fatma  (AS: >; IS: >)
  AS = S:today(see(Fatma, little(Ahmet)))
  IS = [Topic:today, Focus:[little, Ahmet], Ground:[see, Fatma]]

Figure 1: Deriving the Predicate-Argument and Information Structure for a Simple Sentence.

In Figure 1, every word in the sentence is associated
with a lexical category, which is
then associated with an ordering category.
Each derivation step applies a pair of
rules to combine two constituents together; the
first rule is for combining the syntactic categories,
and the second is for combining the ordering
categories of the two constituents. The syntactic
constituents are allowed to combine to form a
larger constituent only if their pragmatic counterparts
(the ordering categories) can also combine.
Thus, the derivation reflects the single surface
structure for the sentence, while compositionally
building the AS and the IS of the sentence in
parallel. 
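
The way a verbal ordering category such as I/(Grnd2)\Top\(Grnd1)\Foc consumes its arguments can be sketched as follows. This is only an illustration under simplifying assumptions: constituents are plain strings, given/new features and the alternative "inferrable topic" IS are not modelled, and `TEMPLATE` and `information_structures` are hypothetical names rather than part of the implemented system.

```python
# Arguments of I/(Grnd2)\Top\(Grnd1)\Foc, listed in the order they are found.
# Each entry: (role, side of the verb, optional?)
TEMPLATE = [
    ("Focus", "left", False),
    ("Ground1", "left", True),
    ("Topic", "left", False),
    ("Ground2", "right", True),
]


def information_structures(preverbal, verb, postverbal):
    """Return every complete IS for the given surface order around the verb."""
    results = []

    def step(i, left, right, filled):
        if i == len(TEMPLATE):
            if not left and not right:  # every constituent has found a role
                ground = [verb] + [filled[r] for r in ("Ground1", "Ground2")
                                   if r in filled]
                results.append({"Topic": filled["Topic"],
                                "Focus": filled["Focus"],
                                "Ground": ground})
            return
        role, side, optional = TEMPLATE[i]
        if optional:                    # rewriting rule X|(Y) => X (skip)
            step(i + 1, left, right, filled)
        if side == "left" and left:     # backward application, nearest first
            step(i + 1, left[:-1], right, {**filled, role: left[-1]})
        if side == "right" and right:   # forward application
            step(i + 1, left, right[1:], {**filled, role: right[0]})

    step(0, list(preverbal), list(postverbal), {})
    return results


print(information_structures(["today", "little Ahmet"], "see", ["Fatma"]))
```

For the word order of Figure 1 the search has a single successful path: the immediately preverbal constituent fills the focus, the optional first ground is skipped, the sentence-initial constituent fills the topic, and the postverbal constituent joins the ground.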

Using this formalism, I have implemented 
a database query system (Hoffman, 1994) 
which generates Turkish sentences with context- 
appropriate word orders, in answer to database 
queries. In generation, the same topic found 
in the database query is maintained in the an- 
swer. For wh-questions, the information that is 
retrieved from the database to answer the ques- 
tion becomes the focus of the answer. I have ex- 
tended the system to also handle yes-no questions 
involving the question morpheme "mi" , which is 
placed next to whatever element is being ques- 
tioned in the sentence. If the verb is being ques- 
tioned, this is a cue that the assertion or nega- 
tion of the verb will be the focus of the answer: 

(12) a. Ahmet'i Fatma gördü mü?
        Ahmet-Acc Fatma see-Past Quest.
        "As for Ahmet, did Fatma SEE him?"

     b. Hayır, Ahmet'i_T Fatma [GÖRmedi]_F.
        No, Ahmet-Acc Fatma see-Neg-Past.
        "No, (as for Ahmet) Fatma did NOT see him."

In most Turkish sentences, the immediately preverbal
position is prosodically prominent, and this
corresponds with the informational focus. However,
verbs can be focused in Turkish by placing
the primary stress of the sentence on the verb instead
of on the immediately preverbal position, and by
lexical cues such as the placement of the question
morpheme. Thus, we must have more than one
IS available for verbs, where verbs can be in the
focus or the ground component of the IS. In addition,
it is possible to focus the whole VP or the
whole sentence, which can be determined by the
context, in this case the database query:
(13) a. Bugün Fatma ne yapacak?
        Today Fatma what do-Fut?
        "What's Fatma going to do today?"

     b. Bugün Fatma [kitap okuyacak]_F.
        Today Fatma book read-Fut.
        "Today, Fatma is going to [read a BOOK]_F."

In yes/no questions, if a non-verbal element is 
being focused by the question morpheme and the 
answer is no, the system provides a more natu- 
ral and helpful answer by replacing the focus of 
the question with a variable and searching the 
database for an alternate entity that satisfies the 
rest of the question. 
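
The answering strategy for yes/no questions with a focused non-verbal element can be sketched as below. The fact base, the entity "Ali", and the function `answer_yes_no` are hypothetical and serve only to illustrate the idea of replacing the focus with a variable; the actual system is the Prolog implementation described in (Hoffman, 1994).

```python
# Hypothetical database: the only stored fact is see(Fatma, Ali).
FACTS = {("see", "Fatma", "Ali")}


def answer_yes_no(pred, args, focus_index, facts=FACTS):
    """Answer a yes/no question pred(args) whose focused element is
    args[focus_index]. If the answer is no, treat the focused argument as a
    variable and look for an alternate entity satisfying the rest."""
    if (pred, *args) in facts:
        return "yes", tuple(args)
    for fact in sorted(facts):
        same_shape = fact[0] == pred and len(fact) == len(args) + 1
        if same_shape and all(i == focus_index or fact[i + 1] == a
                              for i, a in enumerate(args)):
            return "no", fact[1:]      # alternate answer with a new focus
    return "no", None                  # nothing satisfies the rest


# "Did Fatma see AHMET?" with the object focused
print(answer_yes_no("see", ("Fatma", "Ahmet"), focus_index=1))
```

With this toy fact base the question is answered negatively, and the alternate entity found for the focused position can then be realized as the focus of the generated answer.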

Thus, Multiset CCG allows certain pragmatic 
distinctions to influence the syntactic construction 
of the sentence using a lexicalized compositional 
method. In addition, it provides a uniform ap- 
proach to handle word order variation among ar- 
guments and adjuncts, and as we will see in the 
next section, across clause boundaries. 

5 Complex Sentences 

5.1 Embedded Information Structures 

As in matrix clauses, arguments and adjuncts 
in embedded clauses can occur in any order. 
To capture the interpretation of the word order 
within embedded clauses, my formalism allows for 
embedded information structures. Subordinate 



verbs, just like matrix verbs, are associated with 
an ordering category which determines the infor- 
mation structure for the clause. When the sub- 
ordinate clause syntactically combines with the 
matrix clause, the IS of the subordinate clause 
is embedded into the IS of the matrix clause. For 
example, in the complex sentence and its IS be- 
low, the embedded clause is the topic of the matrix 
clause since it occurs in the sentence-initial posi- 
tion of the matrix clause. The word order vari- 
ation within the embedded clause indicates the 
structure of the IS that is embedded under topic. 

(14) a. [Dün Fatma'nın gittiğini] Ayşe biliyor.
        [Yest. Fatma-Gen go-Ger-Acc] Ayşe knows.
        "It's AYŞE who knows that yesterday, FATMA left."

     b. [ Topic:   [ Topic:   yesterday
                     Comment: [ Focus:  Fatma
                                Ground: go ] ]
          Comment: [ Focus:  Ayşe
                     Ground: know ] ]


To ensure that the embedded IS is complete before
it is placed into the matrix clause's IS, we restrict
the application rules (e.g. X/Y  Y ⇒ X)
in the ordering component of Multiset-CCG; we
stipulate that the argument Y must not be a func- 
tion (with arguments left to find). The restriction 
ensures that the ordering category for the embed- 
ded verb is no longer a function, that it has found 
all of its obligatory components and skipped all 
the optional ones before combining with the ma- 
trix verb's ordering category. 

5.2 Long Distance Scrambling 

In Turkish complex sentences with clausal ar- 
guments, elements of the embedded clauses can 
occur in matrix clause positions, i.e. long dis- 
tance scrambling. However, speakers only use 
long distance scrambling for specific pragmatic 
functions. Generally, an element from the em- 
bedded clause can occur in the sentence initial 
topic position of the matrix clause (e.g. (15)b) or 
to the right of the matrix verb as backgrounded 
information (e.g. (15)d), but cannot occur in 
the stressed immediately preverbal position (e.g. 
(15)c). This long distance dependency is sim- 
ilar to the English topicalization construction. 

(15) a. Ayşe [Fatma'nın dün gittiğini] biliyor.
        Ayşe [Fatma-Gen yesterday go-Ger-Acc] knows.
        "Ayşe knows that Fatma left yesterday."

     b. Fatma'nın Ayşe [dün gittiğini] biliyor.
        Fatma-Gen Ayşe [yest. go-Ger-Acc] knows.

     c. *Ayşe [dün gittiğini] FATMA'nın biliyor.
        *Ayşe [yest. go-Ger-Acc] Fatma-Gen knows.

     d. Ayşe [dün gittiğini] biliyor Fatma'nın.
        Ayşe [yest. go-Ger-Acc] knows Fatma-Gen.



Multiset-CCG can recover the appropriate 
predicate-argument relations of the embedded 
clause and the matrix clause even when the ar- 
guments occur out of the domain of the subordi- 
nate verb. The composition rules allow two verb 
categories with multisets of arguments to combine 
together. As the two verbs combine, their argu- 
ments collapse into one argument set in the syn- 
tactic representation. As seen in the derivation 
below, we compose the verbs together to form a 
complex verbal function, which can then apply to 
the arguments of both verbs in any order. 

(16)
gittiğini             biliyor
go-gerund-acc         knows
S_Na:go(y)|{Ng:y}     S:know(x, p)|{Nn:x, S_Na:p}
-------------------------------------------------- <B
S:know(x, go(y))|{Nn:x, Ng:y}

Although the verbs' argument sets are collapsed 
into one set, their respective arguments are still 
distinct within the semantic representation of the 
sentence. The propositional interpretation of the 
subordinate clause is embedded into the interpre- 
tation of the matrix clause. 
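
Backward composition as used in (16) can be sketched in a few lines of Python. The sketch assumes categories are triples of a label, a nested-tuple semantics and a multiset of arguments; `substitute`, `compose` and `apply_argument` are hypothetical helper names, and string substitution stands in for the unification used in the actual grammar.

```python
from collections import Counter


def substitute(term, var, value):
    """Replace variable `var` by `value` inside a nested-tuple term."""
    if term == var:
        return value
    if isinstance(term, tuple):
        return tuple(substitute(t, var, value) for t in term)
    return term


def compose(embedded, matrix):
    """Y|Args_Y  X|(Args_X U {Y})  =>  X|(Args_X U Args_Y)."""
    e_label, e_sem, e_args = embedded
    m_label, m_sem, m_args = matrix
    if e_label.startswith("N"):        # restriction (8c): Y must not be an NP
        return None
    for (label, var) in m_args:
        if label == e_label:
            rest = m_args - Counter([(label, var)])
            return m_label, substitute(m_sem, var, e_sem), rest + e_args
    return None


def apply_argument(cat, label, meaning):
    """Application: consume one argument with the given label."""
    c_label, sem, args = cat
    for (arg_label, var) in args:
        if arg_label == label:
            return (c_label, substitute(sem, var, meaning),
                    args - Counter([(arg_label, var)]))
    return None


go = ("S_Na", ("go", "y"), Counter([("Ng", "y")]))
know = ("S", ("know", "x", "p"), Counter([("Nn", "x"), ("S_Na", "p")]))
verbs = compose(go, know)              # S:know(x, go(y))|{Nn:x, Ng:y}
print(apply_argument(apply_argument(verbs, "Ng", "Fatma"), "Nn", "Ayşe"))
```

The two argument sets are merged into one multiset, so the genitive and nominative NPs can be found in either order, while the nested term keeps "Fatma" inside the interpretation of "go" and "Ayşe" as the subject of "know".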

The syntactic component of Multiset-CCGs 
correctly rules out long distance scrambling to the 
immediately preverbal matrix position, because 
elements from the embedded clause cannot com- 
bine with the matrix verb before the matrix verb 
has combined with the embedded verb. 
(17)
* [Gittiğini]    Ayşe    Fatma'nın    biliyor.
* [Go-Ger-Acc]   Ayşe    Fatma-Gen    know-Pres.
  S_Na|{Ng}      Nn      Ng           S|{Nn, S_Na}
                         ---------XXX---------




Long distance scrambling to the sentence initial 
position and post-verbal position in the matrix 
clause is handled through the composition of the 
verbs, as seen in Figure 2. 

The ordering component of Multiset CCG al- 
lows individual elements from subordinate clauses 
to be components in the IS of the matrix clause. 
This is because the ordering category for a ma- 
trix verb does not specify that its components be 
arguments in its AS. In the sentence in Figure 2, 
"Fatma", an argument of the embedded clause, 
has been scrambled into the topic position of the 
matrix clause. The derivation with both compo- 
nents of the grammar working in parallel is shown 
in Figure 2. The embedded verb must first complete
its IS (IS2); then, the two verbs compose together,
and the subordinate IS is embedded into
the matrix IS (IS1). The complex verbal constituent
can then combine with the rest of the
arguments of both verbs in any order. The lin- 
ear order of the two NP arguments will determine 
which components of the matrix IS each fill. Note 
that "Fatma" is an argument in the interpretation 
of the embedded verb "go" , not the matrix verb 
"know" , but it plays the role of topic in the matrix 



Fatma'nın    Ayşe    [dün          gittiğini]     biliyor.
Fatma-Gen    Ayşe    [yesterday    go-Ger-Acc]    know-Pres.

Lexical categories (word, AS category, ordering category):
  Fatma'nın    Ng:Fatma                       X:Fatma
  Ayşe         Nn:Ayşe                        Y:Ayşe
  dün          S:yest(P)|{S:P}                W:yesterday
  gittiğini    S_Na:go(X)|{Ng:X}              IS2/(Grnd2)\(Top)\(Grnd1)\Foc
  biliyor      S:know(Y, Z)|{Nn:Y, S_Na:Z}    IS1/(Grnd2)\Top\(Grnd1)\Foc

dün + gittiğini  (AS: >B; IS: <, skip 3)
  AS  = S_Na:yesterday(go(X))|{Ng:X}
  IS2 = [Topic:inferrable, Focus:yesterday, Ground:go]

[dün gittiğini] + biliyor  (AS: <B; IS: <)
  AS = S:know(Y, yesterday(go(X)))|{Nn:Y, Ng:X}
  IS = IS1/(Grnd2)\Top\(Grnd1), where
  IS1 = [ Topic:   Top
          Comment: [ Focus:  IS2 = [ Topic:   inferrable
                                     Comment: [ Focus:  yest.
                                                Ground: go ] ]
                     Ground: [know, Grnd1, Grnd2] ] ]

Ayşe + [dün gittiğini biliyor]  (AS: <; IS: <)
  AS = S:know(Ayşe, yesterday(go(X)))|{Ng:X}
  IS = IS1/(Grnd2)\Top

Fatma'nın + [Ayşe dün gittiğini biliyor]  (AS: <; IS: <)
  AS = S:know(Ayşe, yesterday(go(Fatma)))
  IS = [ Topic:   Fatma
         Comment: [ Focus:  [ Topic:   inferrable
                              Comment: [ Focus:  yesterday
                                         Ground: go ] ]
                    Ground: [Ayşe, know] ] ]

Figure 2: Derivation for the AS and IS of a Complex Sentence.




verb's IS. Thus, adjuncts and elements from em- 
bedded clauses can play a role in the information 
structure of the matrix clause, although they do 
not belong to the same predicate-argument struc- 
ture. 

5.3 Islands 

The syntactic component of Multiset-CCGs can
derive a string of any number of scrambled
NPs followed by a string of verbs:
(NP_1 ... NP_m)_scrambled V_m ... V_1, where each verb,
V_i, subcategorizes for NP_i. The more one scrambles
things, the harder the sentence is to process,
but there is no clear cut-off point in which the 
scrambled sentences become ungrammatical for 
native speakers. Thus, I claim that processing 
limitations and pragmatic purposes, rather than 
syntactic competence, restrict such scrambling. 

However, some types of clauses, in some "free" 
word order languages, act as islands that strictly 
do not allow long distance scrambling. In other 
"free" word order languages, such as Turkish, it is 



very hard to find island effects. As seen in the first 
example in Figure 3, even elements from relative 
clauses can be extracted. However, it is harder to 
extract elements from some adjunct clauses which 
do not have close semantic links to the matrix 
clause. To account for clauses exhibiting island
behaviour, we can assign the head of the clause a
category such as S|S|{Nn, Na}, which makes certain
that the head combines with all of its NP arguments
before combining with the matrix clause,
S. As demonstrated in (19)c in Figure 3, long distance
scrambling out of such an adjunct clause is
thus prohibited.

In contrast, heads of adjunct clauses which 
are not islands are assigned categories such as 
S|{S, Nn, Na}. Since this category can combine
with the matrix verb even before it has combined 
with all of its arguments, it allows long distance 
scrambling of its arguments. This lexical control 
of the behaviour is very advantageous for captur- 
ing Turkish, since not every adjunct clause is an 
island in Turkish. However, further research is 



(18) Ankara'dan_i sen [e_i dün gelen] adamı tanıyor musun?
     Ankara-Abl_i you [e_i yest. come-Rel] man-Acc know Quest-2Sg?
     "Do you know the man who came yesterday from Ankara?"

(19) a. [Berna ödevini bitirince] bana yardım edecek.
        [Berna hw-3Ps-Acc finish-ger] I-dat help do-3Sg.
        "When Berna finishes (her) homework, (she) is going to help me."

     b. *[Berna bitirince] bana yardım edecek ödevini.
        *[Berna finish-ger] I-dat help do hw-3Ps-Acc.

     c. *Berna    finish-ger       I-dat help do    hw-3Ps-Acc
         Nn       S|S|{Nn, Na}     S                Na
         ------------------------ <
         S|S|{Na}
         ---------------XXX--------------- ---XXX---

Figure 3: Long Distance Scrambling Out of Adjunct Clauses




needed to determine what types of adjunct clauses 
exhibit island behaviour in order to specify the ap- 
propriate categories in the lexicon. 
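
The contrast between the island category S|S|{Nn, Na} and the non-island category S|{S, Nn, Na} can be made concrete with a small sketch. It assumes a category is represented as a list of argument multisets in which only the rightmost non-empty multiset is open for combination; `consume`, `derives`, `ISLAND` and `NON_ISLAND` are hypothetical names used only here.

```python
from collections import Counter

ISLAND = [Counter(["S"]), Counter(["Nn", "Na"])]   # S|S|{Nn, Na}
NON_ISLAND = [Counter(["S", "Nn", "Na"])]          # S|{S, Nn, Na}


def consume(stages, label):
    """Combine with one argument. Only the rightmost non-empty multiset is
    accessible; return the updated stages, or None if `label` is not in it."""
    for i in range(len(stages) - 1, -1, -1):
        if stages[i]:
            if stages[i][label] == 0:
                return None
            updated = list(stages)
            updated[i] = stages[i] - Counter([label])
            return updated
    return None


def derives(stages, order):
    """True if the arguments can be found in the given order."""
    for label in order:
        stages = consume(stages, label)
        if stages is None:
            return False
    return not any(stages)


# Combining with the matrix S before the accusative NP has been found
print(derives(ISLAND, ["Nn", "S", "Na"]))       # scrambling out of an island
print(derives(NON_ISLAND, ["Nn", "S", "Na"]))   # scrambling out of a non-island
```

Under these assumptions, the curried category blocks the order in which the matrix clause is reached before the accusative argument, as in (19)c, whereas the flat multiset allows it. The choice between the two is made per lexical head.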

6 Conclusions 

I have presented a combinatory categorial formal- 
ism that can account for both the syntax and in- 
terpretation of "free" word order in Turkish. The 
syntactic component of Multiset CCG is flexible 
enough to derive the predicate-argument structure 
of simple and complex sentences without relying 
on word order, and it is expressive enough to cap- 
ture syntactic restrictions on word order in dif- 
ferent languages such as languages with NP or 
clausal islands or languages which allow discon- 
tinuous NPs or clauses. Word order is used in 
the ordering component of Multiset CCG to de- 
termine the information structure of a sentence. 
Every Multiset CCG category encoding syntac- 
tic and semantic properties is associated with an 
ordering category which encodes the ordering of 
information structure components such as topic 
and focus; two syntactic/semantic categories are 
allowed to combine to form a larger constituent 
only if their ordering categories can also combine. 
The formalism has been implemented within a 
database query task in Quintus Prolog, to inter- 
pret and generate simple and complex sentences 
with context-appropriate word orders. 

Multiset CCG captures the context-appropriate 
use of word order by compositionally deriving the 
predicate-argument structure and the information 
structure of a sentence in parallel. It allows ad- 
juncts and elements from embedded clauses to 
take part in the information structure of the ma- 
trix clause, even though they do not take part in 
its predicate-argument structure. Thus, this for- 
malism provides a uniform approach in capturing 
the syntactic and pragmatic aspects of word or- 
der variation among arguments and adjuncts, and 
across clause boundaries. 



References 

Elisabet Engdahl and Enric Vallduvi. Informa- 
tion Structure and Grammar Architecture, pre- 
sented at NELS, University of Pennsylvania, 
1994. 

Eser Emine Erguvanli. The Function of Word Or- 
der in Turkish Grammar. University of Califor- 
nia Press, 1984. UCLA PhD dissertation 1979. 

Beryl Hoffman. A CCG Approach to Free Word 
Order Languages. In the Proceedings of the 30th 
Annual Meeting of the ACL, Student Session, 
1992. 

Beryl Hoffman. Generating Context-Appropriate
Word Orders in Turkish. In the Proceedings of
the International Workshop on NL Generation,
1994.

Tanya Reinhart. Pragmatics and Linguistics: An
Analysis of Sentence Topics. Philosophica 27,
53-94, 1982.

Petr Sgall, Eva Hajicova, and J. Panevova. The 
Meaning of the Sentence and its Semantic and 
Pragmatic Aspects. Dordrecht: Reidel; Prague: 
Academia, 1986. 

Mark Steedman. Dependencies and Coordination 
in the Grammar of Dutch and English. Lan- 
guage, 61:523-568, 1985. 

Mark Steedman. Structure and Intonation. Lan- 
guage, 67:260-296, 1991. 

Ralf Steinberger. Treating Free Word Order
in Machine Translation. COLING 1994, Kyoto,
Japan.

Enric Vallduvi. The Informational Component. 
PhD thesis, University of Pennsylvania, 1990. 



